A two-parameter generalized Poisson model to improve the analysis of RNA-seq data
Abstract
Deep sequencing of RNAs (RNA-seq) has been a useful tool to characterize and quantify transcriptomes. However, there are significant challenges in the analysis of RNA-seq data, such as how to separate signals from sequencing bias and how to perform reasonable normalization. Here, we focus on a fundamental question in RNA-seq analysis: the distribution of the position-level read counts. Specifically, we propose a two-parameter generalized Poisson (GP) model for the position-level read counts. We show that the GP model fits the data much better than the traditional Poisson model. Based on the GP model, we can better estimate gene or exon expression, perform a more reasonable normalization across different samples, and improve the identification of differentially expressed genes and the identification of differentially spliced exons. The usefulness of the GP model is demonstrated by applications to multiple RNA-seq data sets.
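As a rough illustration of what a two-parameter GP model looks like, the sketch below assumes the standard Consul form of the generalized Poisson distribution, P(X = x) = θ(θ + xλ)^(x−1) e^(−θ−xλ) / x!, with mean θ/(1 − λ) and variance θ/(1 − λ)³. The abstract does not state the parameterization or the estimation procedure, so the function names `gp_logpmf` and `gp_moment_fit` and the method-of-moments fit are assumptions made for illustration, not the authors' implementation.

```python
import math


def gp_logpmf(x, theta, lam):
    """Log-pmf of the generalized Poisson (Consul form, assumed here).

    Valid for theta > 0 and 0 <= lam < 1; lam = 0 gives the ordinary Poisson.
    """
    return (math.log(theta)
            + (x - 1) * math.log(theta + x * lam)
            - theta - x * lam
            - math.lgamma(x + 1))


def gp_moment_fit(counts):
    """Method-of-moments estimates (theta, lam) from position-level counts.

    Uses mean = theta / (1 - lam) and var = theta / (1 - lam)**3, so that
    var / mean = 1 / (1 - lam)**2. Assumes var >= mean > 0 (overdispersion).
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    lam = 1.0 - math.sqrt(mean / var)
    theta = mean * (1.0 - lam)
    return theta, lam


if __name__ == "__main__":
    # Hypothetical read counts at consecutive positions of one exon.
    counts = [0, 1, 2, 3, 10]
    theta, lam = gp_moment_fit(counts)
    loglik = sum(gp_logpmf(c, theta, lam) for c in counts)
    print(theta, lam, loglik)
```

The extra parameter λ lets the variance exceed the mean, which a plain Poisson model cannot represent. A maximum-likelihood fit would normally replace the moment estimates used here.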
Similar resources
Bayesian paradigm for analysing count data in longitudinal studies using Poisson-generalized log-gamma model
In analyzing longitudinal data with count responses, a normal distribution is usually assumed for the random effects. However, in some applications the random effects may not be normally distributed. Misspecification of this distribution may reduce the efficiency of the estimators. In this paper, a generalized log-gamma distribution is used for the random effects which includes th...
Application of the generalized Poisson regression model to the analysis of fertility data of rural women in Fars Province
Background & objectives: Statistical modeling explains the observed changes in data by means of mathematical equations. When the dependent variable is a count, the Poisson model is applied. If the Poisson model is not applicable in a specific situation, it is better to apply the generalized Poisson model. So, our emphasis in this study is to notice the data structure, introducing the generalized Po...
Sample Size Calculation of RNA-sequencing Experiment-A Simulation-Based Approach of TCGA Data
Power and sample size calculation is an essential component of experimental design in biomedical research. For RNA-sequencing experiments, sample size calculations have been proposed based on mathematical models such as Poisson and negative binomial; however, RNA-seq data has exhibited variations, i.e. over-dispersion, that have caused past calculation methods to be under- or over-powered. Because o...
The Negative Binomial Distribution Efficiency in Finite Mixture of Semi-parametric Generalized Linear Models
Introduction: Selecting the appropriate statistical model for the response variable is one of the most important problems in finite mixtures of generalized linear models. One distribution that is problematic in a finite mixture of semi-parametric generalized statistical models is the Poisson distribution. In this paper, to overcome overdispersion and computational burden, finite ...
Comparison of the efficiency of generalized Poisson regression models with standard Poisson regression in analyzing the fertility behavior of women in Kashan in 1391
Introduction: Different statistical methods can be used to analyze fertility data. When the response variable is discrete, the Poisson model is applied. If the conditions of the Poisson model do not hold, its generalized form is applied. The goal of this study was to compare the efficiency of the generalized Poisson regression model with the standard Poisson regression model in estimating the c...